Add PTX vector memory intrinsics by ilehtoranta · Pull Request #4 · LostBeard/SpawnDev.ILGPU

ilehtoranta · 2026-05-07T08:50:24Z

Summary

Adds PTX-only vector memory intrinsics for explicit f32 vector load/store code generation.

This introduces:

PTXMemory.LoadF32x2 / StoreF32x2
PTXMemory.LoadF32x4 / StoreF32x4
Float2 and Float4 helper structs
intrinsic registration in the PTX algorithms context
aligned/vectorized ArrayView convenience helpers

The main use case is CUDA kernels that need predictable vector memory instructions instead of relying on backend inference from ordinary scalar or struct access patterns.

Details

The new PTX intrinsics generate explicit PTX vector memory operations:

ld.v2.f32
st.v2.f32
ld.v4.f32
st.v4.f32

For f32x4, ptxas can lower these to 128-bit global memory instructions such as LD.E.128 and ST.E.128 when alignment and addressing are suitable.

This is useful for performance-sensitive kernels that operate on adjacent float values.
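
To make "alignment and addressing are suitable" a bit more concrete: a v4.f32 access covers four adjacent floats (16 bytes), and the address has to be 16-byte aligned. The host-side C++ sketch below is only an illustration of that address arithmetic and is not code from this PR; `is_f32x4_aligned`, `VecSplit` and `split_for_f32x4` are hypothetical names. It checks the alignment condition and shows how a run of floats would split into a scalar head, aligned 4-wide groups, and a scalar tail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");

// Hypothetical helper: true when p could be the base of a 16-byte (4 x f32) access.
inline bool is_f32x4_aligned(const float* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical description of how n floats starting at p would be processed.
struct VecSplit {
    std::size_t head;    // leading scalars before the first 16-byte boundary
    std::size_t chunks;  // number of aligned 4 x f32 groups
    std::size_t tail;    // trailing scalars after the last full group
};

inline VecSplit split_for_f32x4(const float* p, std::size_t n) {
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    // The pointer must at least be float-aligned for the arithmetic below.
    assert(addr % alignof(float) == 0);
    // Position of p inside its 16-byte block, counted in floats (0..3).
    const std::size_t misalign = (addr % 16) / sizeof(float);
    // Floats to process one by one until the next 16-byte boundary.
    std::size_t head = (4 - misalign) % 4;
    if (head > n) head = n;
    const std::size_t rest = n - head;
    return VecSplit{head, rest / 4, rest % 4};
}
```

A kernel that cannot guarantee `head == 0` and `tail == 0` for its views would need scalar paths for those elements, which is the kind of case the aligned ArrayView helpers are meant to make explicit.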

LostBeard · 2026-05-10T19:16:18Z

Awesome! I'll take a look asap.

ilehtoranta · 2026-05-17T12:40:49Z

Is there anything I could help with? I mean, adding more tests, for example?

- Add PTXMemory class (ILGPU.Algorithms.PTX) with ld.v2/v4.f32 and st.v2/v4.f32 intrinsics; Float2/Float4 structs
- Add ArrayView LoadVectorized/StoreVectorized/CastAligned extension helpers
- Revert CudaAccelerator.DefaultMaxRegistersPerThread default from 255 to 0 (restores occupancy on normal kernels)
- Remap System.Numerics.BitOperations to hardware-backed IntrinsicMath methods (CLZ/PopC/CTZ)
- Add CUDA-only unit tests for all new PTX vector memory variants
- Bump ILGPU/ILGPU.Algorithms fork to 2.0.7; SpawnDev.ILGPU to 4.9.6-local.1

Addresses ilehtoranta Discussion #5 and PR #4.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

LostBeard · 2026-05-22T05:08:36Z

Merged in commit 2ec94d6, shipping in 4.9.6 (currently published to my local feed as 4.9.6-local.1; rc to nuget.org shortly).

Applied changes:

PTXMemory class with LoadF32x2, LoadF32x4, StoreF32x2, StoreF32x4 (ref + struct + scalar-argument forms)
Float2 and Float4 readonly structs
PTXContext.cs RegisterMemoryIntrinsics() + per-intrinsic registration helper
PTX code generators: GenerateLoad, GenerateStore, GenerateStoreScalars
PTXContext.Generated.tt + .cs wired up
ArrayView<T>.LoadVectorized/StoreVectorized/CastAligned extension helpers + ArrayView1D<T,Dense> overloads

The RemappedIntrinsics.cs change that retargets System.Numerics.BitOperations at the hardware-backed IntrinsicMath methods (with [MathIntrinsic(CLZ/PopC/CTZ)]) is a real cross-backend win. Instead of compiling the C# software fallback on every backend, it now emits popc/clz/ctz (PTX), popcount/clz/ctz (OpenCL), countOneBits/countLeadingZeros/countTrailingZeros (WebGPU), i32.popcnt/clz/ctz (Wasm), and a parallel-bit-count path on WebGL. Nice catch.
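
For anyone wondering what the difference looks like in practice, here is a rough C++ sketch (not ILGPU code, and not how BitOperations is actually implemented): portable bit loops standing in for a software fallback, next to the GCC/Clang builtins that usually compile down to a single hardware instruction. All function names are hypothetical, and mapping a zero input to 32 is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Software fallbacks: portable loops, several instructions per call.
inline int popcount_soft(std::uint32_t x) {
    int n = 0;
    while (x != 0) { x &= x - 1; ++n; }  // clear the lowest set bit each round
    return n;
}

inline int clz_soft(std::uint32_t x) {
    if (x == 0) return 32;
    int n = 0;
    while ((x & 0x80000000u) == 0) { x <<= 1; ++n; }  // shift until the top bit is set
    return n;
}

inline int ctz_soft(std::uint32_t x) {
    if (x == 0) return 32;
    int n = 0;
    while ((x & 1u) == 0) { x >>= 1; ++n; }  // shift until the bottom bit is set
    return n;
}

// Hardware-backed variants via GCC/Clang builtins (assumes a 32-bit unsigned int).
// __builtin_clz and __builtin_ctz are undefined for 0, so that case is guarded.
inline int popcount_hw(std::uint32_t x) { return __builtin_popcount(x); }
inline int clz_hw(std::uint32_t x) { return x == 0 ? 32 : __builtin_clz(x); }
inline int ctz_hw(std::uint32_t x) { return x == 0 ? 32 : __builtin_ctz(x); }

// Sanity check: both paths must agree for a given input.
inline void check_agree(std::uint32_t x) {
    assert(popcount_soft(x) == popcount_hw(x));
    assert(clz_soft(x) == clz_hw(x));
    assert(ctz_soft(x) == ctz_hw(x));
}
```

The results are identical either way; the remap only changes how many instructions each call costs inside a kernel.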

Added CUDA-only unit tests covering all five variants (ld.v2.f32, ld.v4.f32, st.v2.f32 from struct, st.v2/v4.f32 from scalar args, and the LoadVectorized<Float2> ArrayView path) — all passing under PMT (Tests_PTXVectorMemory_F32x2_LoadStore, Tests_PTXVectorMemory_F32x4_LoadStore, Tests_PTXVectorMemory_F32x2_StoreScalars, Tests_PTXVectorMemory_F32x4_StoreScalars, Tests_ArrayView_LoadVectorized_Float2).

Closing as manually applied. Thank you for the well-structured contribution — the PTX code generators followed the existing ILGPU pattern exactly, easy to drop in.

ilehtoranta · 2026-05-22T05:39:45Z

Thanks! The AI made this easy =)

Add PTX vector memory intrinsics

8ddce05

LostBeard self-assigned this May 10, 2026

LostBeard closed this May 22, 2026

ilehtoranta wants to merge 1 commit into LostBeard:master from ilehtoranta:codex/ptx-vector-memory-intrinsics
